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Long chains of the HP lattice protein model are studied by the Multi-Self-Overlap Ensem- 
ble(MSOE) Monte Carlo method, which was developed recently by the authors Q. MSOE success- 
fully finds the lowest energy states reported before for sequences of the chain length N = 42 ~ 100 
in two and three dimensions. Moreover, MSOE realizes the lowest energy state that ever found in a 
case of N = 100. Finite-temperature properties of these sequences are also investigated by MSOE. 
Two successive transitions are observed between the native and random coil states. Thermodynamic 
analysis suggests that the ground state degeneracy is relevant to the order of the transitions in the 
HP model. PACS numbers: 87.15.By, 87.10. +e, 02.70.Lq 
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Protein folding g is one of the most interesting prob- 
lems in biological science. How amino acid sequences 
code their native structures, functions, and thermody- 
namical behavior? To answer these questions, it is useful 
to study simplified models. Among them, lattice protein 
models have been playing important roles in theoret- 
ical studies of protein folding. For relatively short chains, 
the exact enumerations of all the conformations are possi- 
ble. In dealing with longer chains, however, we encounter 
difficulty, because dynamical Monte Carlo methods are 
not efficient for lattice protein models ||. One of the 
reasons why conventional Monte Carlo algorithms are so 
slow is a rugged free-energy landscape at low temperature 
due to the heterogeneity of the interactions. For over- 
coming slow dynamics due to the ruggedness in other 
random systems such as the spin glasses, extended en- 
semble Monte Cairo methods have been developped: the 
multicanonical algorithm |||7), the simulated tempering 
algorithm || , the exchange Monte Carlo algorithm ||,|l0| , 
and so on. In case of lattice polymers, however, topo- 
logical barriers due to the self-avoiding condition cause 
further difficulty, because they are independent of energy 
and thus cannot be cured by the use of the abovemen- 
tioned extended ensemble methods. In fact, a work by 
Shakhnovich et al. [[ll) suggested that faster dynamics is 
attained by relaxing the self-avoidingness. 

Recently, we proposed a new Monte Carlo method 
called Multi-Self-Overlap Ensemble (MSOE) @] by gen- 
eralizing the idea of the multicanonical ensemble. In 
this method, the self-avoiding condition is systemati- 
cally weakened. While the resultant ensemble contains 
some portions of non-physical self-overlapping configura- 



tions, the correct canonical ensemble of the physical con- 
formations can be reconstructed through the histogram 
reweighting procedure. We observed a considerably fast 
relaxation for a simple lattice heteropolymer compared 
with the conventional multicanonical ensemble method. 

In this letter, we apply MSOE to long chains of the 
HP lattice protein model Q in two and three dimensions. 
We demonstrate that the low energy states are found suc- 
cessfully and thermodynamic properties are calculated at 
the same time. It should be noted that the principle of 
MSOE is independent of specific choices of the interac- 
tions between monomers. Although we restrict ourselves 
to the HP model in the following, extentions to models 
with other interactions, such as the one with Miyazawa- 
Jernigan contact matrix fl2|| , are straightfoward. 

Let us explain MSOE algorithm briefly. First, we de- 
fine the degree of violation of self-avoidance V as 



V 



E 
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where rii is the number of monomers on a lattice point i, 
and G denotes a set of lattice points which are occupied 
by at least one monomer. For self-avoiding conforma- 
tions, V — 0. We assume that the definition of the origi- 
nal energy function E can be extended to conformations 
with self-overlaps in a reasonable manner. Then, we de- 
termine the weight factor P g cx exp[— g(E, V)] through 
plcliminary runs as in the case of the conventional mul- 
ticanonical ensemble [|J, so that the bivariate histogram 
of (E, V) is sufficiently flat ina prescribed range 
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TABLE I. The values ol the lowest energy reported by several authors for three sequences of 2D HP model. 
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E m in < E < E max , < V < V max . We set E min to 
a value which is definitely lower than the ground state 
energy, and set V ma x to a value iV/10 ~ N/5. We actu- 
ally use the entropic sampling method in this paper. 
After the weight factor g(E, V) is determined, a mea- 
surement run is performed. The canonical averages at 
temperature T is calculated according to the histogram 
reweighting formula, 
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where I\ represents a conformation at the ith Monte 
Carlo step and the summations are taken only over the 
self- avoiding conformations. 

How MSOE works? In Fig. 1, observed transitions 
during a MSOE simulation are plotted on the (E, V) 
plane (This example is taken from a simulation for the 
2D N — 64 sequence shown in Table I, of which we 
will discuss below). We can clearly see bridges (or 
paths) between self-avoiding low energy states where self- 
overlapping states are used as stepping stones. That 
is, while no direct transition is seen between (E, 0) and 
(E',0), there are paths that utilize the states with non- 
zero V as itermediate states. The existence of such paths 
are a key feature of MSOE, which effectively facilitates 
the relaxation. The idea behind MSOE can also be ex- 
tended to off lattice models. For example, one can con- 
sider the hard core repulsive part of the energy as an off- 
lattice counterpart of the self-avoiding conditions. Other 
types of bivariate extention of the multicanonical algo- 
rithm for off-lattice protein models have also been dis- 
cussed by Higo et al. p3| . They, however, did not in- 
corporate unphysical conformations for the purpose of 
attaining fast relaxations. MSOE also shares some ideas 
in common with the ghost polymer procedure by Wilding 
and Miiller for dense polymer systems Q. 



FIG. 1. Observed transitions on the (E, V) plane during a 
MSOE run of the N = 64 HP sequence in Table I. Solid 
lines connecting two states represent the transitions be- 
tween them. Only a small part of the entire (E, V) plane 
near the ground state is shown. 

Let us discuss the results of simulations for the HP 
protein model Q), in which a protein consists of a self- 
avoiding chain on a lattice with two types of amino acids: 
H (hydrophobic) and P (Polar). The energy E of a chain 
conformation is determined only by the number of H- 
H contacts h as E — — \e\h, where |e| is a positive con- 
stant (we measure the energy in the unit of |e| hereafter). 
In MSOE, we use the same definition of energy as the 
original one also for self-overlapping conformations. In 
principle, MSOE can be implemented with arbitrary dy- 
namics. We use the following elementary moves in this 
paper: (l)Jacknife move, (2)One bead flip, (3)Pivot op- 
erator. A description of the move (1) is given in ref. Jlj] 
and the moves (2) and (3) are illastrated in ref. All 
CPU times quote below refer to PCs with 500MHz DEC 
21164A chip. 

First we consider a two dimensional HP lattice protein 
with the chain length N — 64 in Table I, several meth- 
ods so far [p~6|— p~9| . For this sequence, the ground state 
energy is believed to be E = —42; one of the ground state 
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conformations is shown in Fig. 5 of ref. [|l9|. We calcu- 
lated the weight factor within 30 hours with E m i n — — 60 
and V m ax = 10. Figure 2 shows the marginal distribu- 
tion of (E, V) obtained by the measurement run. The 
marginal distribution is sufficiently flat including the 
state (—42,0), and we can caluculate thermodynamic 
quantities of the sequence at any temperature using it. 
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FIG. 2. Bivariate histogram H (E, V) obtained by a mea- 
surement run of the N = 64 HP sequence shown in Table 
I. 



Consider another sequence of N = 100 shown in the 
second row of Table I. Bastolla et al. 19 have found 
conformations with E — —49 (E — —48 without the 
abovementioned assumption on the ground state con- 
formations). On the other hand, we found conforma- 
tions with E = —50 within 50 hours by MSOE taking 
Emin = —65 and V ma x = 10. Figure 3 shows a typical 
conformation with E = —50. This conformation con- 
tains some non-bonded HP neighbor pairs. Thus, it can 
never be found when these pairs are not allowed. Unfor- 
tunately, we have not yet obtained the weight factor that 
gives a sufficiently flat distribution of (E, V). 




The values of the lowest energy found by several meth- 
ods are listed in Table I. Core directed chain Growth 
method(CG) (ll| has given the ground states. It is, how- 
ever, specialized in the search of the ground state, and 
thus cannot be used for thermodynamics. Bastolla et 
al. also have found these states (E — —42) by pruned- 
enriched Rosenbluth method(PERM) [[l9| with an as- 
sumption that the ground states do not contain any non- 
bonded HP neighbor pair (the lowest energy obtained 
without this assumption was E = —40). Such an ad- 
ditional assumption will cause uncontrollable biases in 
calculations of finite temperature properties. Moreover, 
as we will see later, there are examples in which low en- 
ergy states does not satisfy this assumption. Then, as far 
as we know, MSOE is currently the only method which 
can calculate finite temperature properties as well as the 
low energy states of this sequence. 

We calculated the specific heat and the gyration ra- 
dius. We found a peak at T ~ 0.55 and a shoulder a 
T ~ 0.4 in the specific heat curve. The shoulder at 
T = 0.4 corresponds to the transition between the ground 
states and compact globule states, which is a first-order- 
like transition since the energy distribution becomes bi- 
modal. The peak at T — 0.55, on the other hand, is the 
transition between compact globule states and the ran- 
dom coil; In this case the energy distribution is unimodal, 
that is, the transition is second-order-like. 



FIG. 3. A typical conformation with E = —50 of the 
second sequence in Table I. This value of the energy is 
reached for the first time by MSOE. 

Next, we discuss a three dimensional example. Yue 
and Dill [Q have given a sequence of the chain length 
N = 42, whose ground states are 4-fold degenerate and 
their conformations resemble the parallel /3-helix found 
for Pectate Lyase C pi] . The ground state energy has 
been calculated exactly as E = —34 by CHCC method. 
As far as we know, no attempt have been reported 
so far to calculate themodynamic properties of this se- 
quence. We appleid MSOE to this sequence. By taking 
Emin — —50 and V max — 8, we successfully obtained 
appropriate weight factor within 50 hours. Temperature 
dependence of the specific heat, the entropy and the gy- 
ration radius are shown in Fig. 4. We see two peaks in the 
specific heat curve. The peak at T ~ 0.28 and the other 
one at T ~ 0.52 correspond to the transition between the 
ground states and compact globule states (first-order- 
like) and one between the compact globule states and 
the random coil (second-order-like), respectively. The 
gyration radius is considerably small at T ~ 0.52. We 
also found that (not shown in the figure) the fluctuation 
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of the gyration radius has a peak at temperature signif- 
icantly higher than T ~ 0.52. These observations imply 
that the coil-globule transition takes place in two steps. 
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FIG. 4. Temperature dependence of the specific heat, the 
entropy, and the gyration radius for the /3-helix sequence. 

We made MSOE simulations for two more HP se- 
quences: another sequence of TV = 100 on the square 
lattice given in the third row of Table I and the N — 48 
sequence on the cubic lattice given as the sequence No. 9 
in ref. For both sequences, we easily obtained appro- 
preate weight factors down to the lowest energy states 
(E = —47 and E = —34, respectively), and found that no 
first-order-like transition takes place. Thus, not all the 
HP sequences exhibits first -order-like transition. The 
lowest energy states of these sequences are highly degen- 
erated. On the other hand, the lowest energy states of 2D 
N — 64 sequence and the /3-helix sequence, both of which 
exhibit first-order-like native-globule transitions as we 
have seen above, have highly regular shapes and have 
low degeneracy. The above results imply that low de- 
generacy of the ground states is required for first-order- 
like native-globule transitions to take place in long HP 
chains. 

We also tested the convensional multicanonical algo- 
rithm for all the sequences studied above, but none of 
the lowest energy states found by MSOE could not be 
reached within 4-5 days. Thus, for such long chains 
as treated in the present study, the convensional mul- 
ticanonical algorithm is of no practical use . 

In summary, we applied the Multi-Self-Overlap En- 
semble (MSOE) method for long chains of the HP lat- 
tice protein model in two and three dimensions. For all 
five sequences we treated, MSOE successfully reached the 
lowest energy states reported so far. Especially for the 
sequence in the second row of Table 1 , we found the low- 
est energy states which any other methods have failed to 



find. We thus confirmed that MSOE is a powerfull tool 
for searching the ground states of lattice protein mod- 
els. Moreover, we explored the native-globule transitions 
at finite temperature by MSOE, and found that the de- 
generacy of the ground states is relevant for the order 
of the transition. MSOE is a quite general method so 
that it can be applied to any types of interactions. It 
can calculate correct thermodynamic properties within a 
moderate CPU time (although not fastest in some cases 
actually) as well as low energy states. Especially, once 
the proper weight factors are determined, we can calcu- 
late thermodynamic properties in a arbitrary wide range 
of temperature in a single run of MSOE. 
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